System and method for supporting guaranteed multi-point delivery in a distributed data grid

ABSTRACT

A system and method can support guaranteed multi-point message delivery in a distributed data grid. A messaging facility in the distributed data grid can receive an incoming message that is adapted to be delivered to a plurality of nodes in the distributed data grid. The messaging facility can deliver the incoming message to the plurality of nodes according to an order in a list. Furthermore, a node in the plurality of nodes operates to skip the next node in the list when delivering the incoming message, if the next node is dead or unavailable.

CLAIM OF PRIORITY

This application claims priority to U.S. Provisional Patent Application No. 61/714,100, entitled “SYSTEM AND METHOD FOR SUPPORTING A DISTRIBUTED DATA GRID IN A MIDDLEWARE ENVIRONMENT,” by inventors Robert H. Lee, Gene Gleyzer, Charlie Helin, Mark Falco, Ballav Bihani and Jason Howes, filed Oct. 15, 2012, which application is herein incorporated by reference.

COPYRIGHT NOTICE

A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.

CROSS-REFERENCED APPLICATIONS

The current application hereby incorporates by reference the material in the following patent applications:

U.S. patent application Ser. No. ______, entitled “SYSTEM AND METHOD FOR PROVIDING PARTITION PERSISTENT STATE CONSISTENCY IN A DISTRIBUTED DATA GRID,” by inventors Robert H. Lee and Gene Gleyzer, filed ______ (Attorney Docket No.: ORACL-05359US0).

U.S. patent application Ser. No. ______, entitled “SYSTEM AND METHOD FOR PROVIDING TRANSIENT PARTITION CONSISTENCY IN A DISTRIBUTED DATA GRID,” by inventors Robert H. Lee and Gene Gleyzer, filed ______ (Attorney Docket No.: ORACL-05359US1).

U.S. patent application Ser. No. ______, entitled “SYSTEM AND METHOD FOR SUPPORTING ASYNCHRONOUS MESSAGE PROCESSING IN A DISTRIBUTED DATA GRID,” by inventor Gene Gleyzer, filed ______ (Attorney Docket No.: ORACL-05360US0).

U.S. patent application Ser. No. ______, entitled “SYSTEM AND METHOD FOR SUPPORTING OUT-OF-ORDER MESSAGE PROCESSING IN A DISTRIBUTED DATA GRID,” by inventors Mark Falco and Gene Gleyzer, filed ______ (Attorney Docket No.: ORACL-05364US0).

1. Field of Invention

The present invention is generally related to computer systems, and is particularly related to a distributed data grid.

2. Background

Modern computing systems, particularly those employed by larger organizations and enterprises, continue to increase in size and complexity. Particularly, in areas such as Internet applications, there is an expectation that millions of users should be able to simultaneously access that application, which effectively leads to an exponential increase in the amount of content generated and consumed by users, and transactions involving that content. Such activity also results in a corresponding increase in the number of transaction calls to databases and metadata stores, which have a limited capacity to accommodate that demand.

This is the general area that embodiments of the invention are intended to address.

SUMMARY

Described herein are systems and methods that can support guaranteed multi-point message delivery in a distributed data grid. A messaging facility in the distributed data grid can receive an incoming message that is adapted to be delivered to a plurality of nodes in the distributed data grid. The messaging facility can deliver the incoming message to the plurality of nodes according to an order in a list. Furthermore, a node in the plurality of nodes operates to skip the next node in the list when delivering the incoming message, if the next node is dead or unavailable.

BRIEF DESCRIPTION OF THE FIGURES

FIG. 1 is an illustration of a data grid cluster in accordance with various embodiments of the invention.

FIG. 2 is an illustration of supporting guaranteed multi-point message delivery in a distributed data grid in accordance with various embodiments of the invention.

FIG. 3 is an illustration of creating a chain request message in a distributed data grid in accordance with various embodiments of the invention.

FIG. 4 is an illustration of handling a chain request message in a distributed data grid in accordance with various embodiments of the invention.

FIG. 5 is an illustration of supporting consistent message delivery when a primary owner node of a partition becomes unavailable in a distributed data grid in accordance with various embodiments of the invention.

FIG. 6 illustrates an exemplary flow chart for supporting guaranteed multi-point message delivery in a distributed data grid in accordance with an embodiment of the invention.

DETAILED DESCRIPTION

Described herein are systems and methods that can support guaranteed multi-point message delivery in a distributed data grid.

In accordance with an embodiment, as referred to herein a “distributed data grid”, “data grid cluster”, or “data grid”, is a system comprising a plurality of computer servers which work together to manage information and related operations, such as computations, within a distributed or clustered environment. The data grid cluster can be used to manage application objects and data that are shared across the servers. Preferably, a data grid cluster should have low response time, high throughput, predictable scalability, continuous availability and information reliability. As a result of these capabilities, data grid clusters are well suited for use in computationally intensive, stateful middle-tier applications. Some examples of data grid clusters, e.g., the Oracle Coherence data grid cluster, can store the information in-memory to achieve higher performance, and can employ redundancy in keeping copies of that information synchronized across multiple servers, thus ensuring resiliency of the system and the availability of the data in the event of server failure. For example, Coherence provides replicated and distributed (partitioned) data management and caching services on top of a reliable, highly scalable peer-to-peer clustering protocol.

An in-memory data grid can provide the data storage and management capabilities by distributing data over a number of servers working together. The data grid can be middleware that runs in the same tier as an application server or within an application server. It can provide management and processing of data and can also push the processing to where the data is located in the grid. In addition, the in-memory data grid can eliminate single points of failure by automatically and transparently failing over and redistributing its clustered data management services when a server becomes inoperative or is disconnected from the network. When a new server is added, or when a failed server is restarted, it can automatically join the cluster and services can be failed back over to it, transparently redistributing the cluster load. The data grid can also include network-level fault tolerance features and transparent soft re-start capability.

In accordance with an embodiment, the functionality of a data grid cluster is based on using different cluster services. The cluster services can include root cluster services, partitioned cache services, and proxy services. Within the data grid cluster, each cluster node can participate in a number of cluster services, both in terms of providing and consuming the cluster services. Each cluster service has a service name that uniquely identifies the service within the data grid cluster, and a service type, which defines what the cluster service can do. Other than the root cluster service running on each cluster node in the data grid cluster, there may be multiple named instances of each service type. The services can be either configured by the user, or provided by the data grid cluster as a default set of services.
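By way of illustration only, the service identity described above, a unique name combined with a type that defines the service's capabilities, might be modeled as in the following Java sketch. The class and constant names here are hypothetical and do not reflect Coherence's actual API:

    import java.util.Map;

    // Hypothetical sketch of cluster service identity: each service has a
    // unique name and a type defining what it can do; the three types listed
    // above appear as enum constants.
    public class ServiceRegistry {
        enum ServiceType { ROOT_CLUSTER, PARTITIONED_CACHE, PROXY }

        record ServiceId(String name, ServiceType type) {}

        public static void main(String[] args) {
            // multiple named instances of the same service type can coexist
            Map<String, ServiceId> services = Map.of(
                "DistributedCacheA", new ServiceId("DistributedCacheA", ServiceType.PARTITIONED_CACHE),
                "DistributedCacheB", new ServiceId("DistributedCacheB", ServiceType.PARTITIONED_CACHE),
                "ProxyService",      new ServiceId("ProxyService",      ServiceType.PROXY));
            services.values().forEach(System.out::println);
        }
    }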

FIG. 1 is an illustration of a data grid cluster in accordance with various embodiments of the invention. As shown in FIG. 1, a data grid cluster 100, e.g. an Oracle Coherence data grid, includes a plurality of cluster nodes 101-106 having various cluster services 111-116 running thereon. Additionally, a cache configuration file 110 can be used to configure the data grid cluster 100.

Guaranteed Multi-Point Message Delivery

In accordance with various embodiments of the invention, a distributed data grid can support guaranteed multi-point message delivery, which can provide consistency in delivering messages among different cluster nodes in the distributed data grid.

FIG. 2 is an illustration of supporting guaranteed multi-point message delivery in a distributed data grid in accordance with various embodiments of the invention. As shown in FIG. 2, a distributed data grid 200 can include a plurality of cluster nodes, such as nodes A-D 201-204.

A cluster node, e.g. node A 201, can be either the originator of an internal message, or a recipient of an internal message. Additionally, the cluster node A 201 can also be a recipient of an incoming message from a client 210. The cluster node A 201 can use a messaging facility 211 to configure and deliver a message to different cluster nodes in the distributed data grid 200, such as nodes B-D 202-204.

The cluster node A 201 can deliver the message to the recipient nodes B-D 202-204 in a particular order, e.g. based on a list from node B 202 to node C 203 and then to node D 204. Here, the messaging facility 211 on the cluster node A 201 can be used to assign the order to the recipient nodes B-D 202-204 in the distributed data grid 200.

Furthermore, each recipient node B-D 202-204 in the list can keep track of how the message is to be delivered down the list. For example, a recipient node, e.g. node B 202, can detect that node C 203, which is the next node in the list, is dead or temporarily unavailable. Then, node B 202 can skip node C 203 and deliver the message directly to node D 204, which is the node that follows node C 203 in the list.
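By way of illustration only, the following minimal Java sketch captures this skip-ahead delivery. The Node interface, isAlive(), and send() are hypothetical names for this sketch and do not reflect Coherence's actual API:

    import java.util.List;

    // Hypothetical sketch of chain delivery with skip-ahead: deliver the
    // message to the first live node at or after position 'start' in the
    // recipient list, skipping any node that is dead or unavailable.
    public class ChainDelivery {
        interface Node {
            boolean isAlive();          // liveness as seen by the cluster
            void send(Object message);  // point-to-point delivery
        }

        static void deliverToNext(List<Node> recipients, int start, Object message) {
            for (int i = start; i < recipients.size(); i++) {
                Node next = recipients.get(i);
                if (next.isAlive()) {
                    next.send(message); // the receiver repeats this step for i + 1
                    return;
                }
                // next is dead or unavailable: skip it and try the node after it
            }
        }
    }

In this sketch, node A would call deliverToNext with start equal to 0, node B with start equal to 1, and so on; when node C is down, node B's loop lands on node D, matching the behavior described above.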

As shown in FIG. 2, the cluster node A 201 can receive an incoming message from a client 210. The incoming message can include a request from the client 210, and the client 210 may expect a response from the distributed data grid 200. Then, a response message can be sent back to the client in the reverse direction, e.g. from node D 204 to node C 203 and then to node B 202 before reaching node A 201. Finally, the distributed data grid 200 can provide the response message to the client 210, after the incoming message is delivered to the recipient nodes B-D 202-204 and processed in the distributed data grid 200.

In accordance with various embodiments of the invention, the guaranteed multi-point message delivery feature can be used for managing partition backups. For example, a cluster node, e.g. the node A 201, can be the owner of a partition, while nodes B-D are backup nodes for the node A 201. Here, the partition can define that the value of a property x is equal to 1 (“x=1”), which can be maintained on both the primary owner node A 201 and each backup node B-D 202-204.

Then, the node A 201 can receive a message from the client 210, which changes the value of the property x to 2 (“x=2”). Thus, the cluster node A 201 may propagate the message, “x=2”, to each backup node B-D 202-204.

As shown in FIG. 2, the cluster node A 201 can deliver the message, “x=2”, to the recipient nodes B-D 202-204 in order. Furthermore, when the node B 202 detects that the node C 203 is dead or unavailable, the node B 202 can deliver the message, “x=2”, to the node D 204 directly, while skipping the node C 203. Thus, the distributed data grid 200 can maintain a consistent view that the value of x equals 2.
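Reusing the hypothetical ChainDelivery sketch above, this propagation of the backup update can be exercised as follows; the node handles and helper methods are toy stand-ins for illustration only:

    import java.util.List;
    import java.util.Map;

    // Hypothetical usage: the update "x=2" travels down the ordered list
    // [B, C, D]; with C dead, node B's step skips straight to node D.
    public class BackupPropagationExample {
        public static void main(String[] args) {
            List<ChainDelivery.Node> recipients =
                List.of(liveNode("B"), deadNode("C"), liveNode("D"));
            Map.Entry<String, Integer> update = Map.entry("x", 2);
            ChainDelivery.deliverToNext(recipients, 0, update); // node A's step: reaches B
            ChainDelivery.deliverToNext(recipients, 1, update); // node B's step: skips C, reaches D
        }

        // toy stand-ins for cluster members, for illustration only
        static ChainDelivery.Node liveNode(String name) {
            return new ChainDelivery.Node() {
                public boolean isAlive() { return true; }
                public void send(Object m) { System.out.println(name + " received " + m); }
            };
        }

        static ChainDelivery.Node deadNode(String name) {
            return new ChainDelivery.Node() {
                public boolean isAlive() { return false; }
                public void send(Object m) { }
            };
        }
    }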

On the other hand, an alternative approach is that the cluster node A 201 can deliver the message to each recipient node B-D 202-204 separately, or in parallel (as shown in dotted lines in FIG. 2).

Unlike the guaranteed multi-point message delivery feature described above, this alternative approach can be problematic in the scenario when a cluster node, e.g. node C 203, is dead or becomes unavailable. Since the delivery of the message, “x=2”, to the cluster node C 203 may not go through, the value of x on node C may silently remain 1.

Thus, using the alternative approach, the distributed data grid 200 may not be able to maintain a consistent view that the value of x equals 2. This alternative approach can cause inconsistency among the different cluster nodes A-D 201-204 in the distributed data grid 200 when the value of x is determined at a later point in time.

Furthermore, such inconsistency may not be resolved until a full synchronization is performed in the distributed data grid 200. The full synchronization in the distributed data grid 200 can be costly, since it may require the distributed data grid 200 to stop providing services.

Additionally, the guaranteed multi-point message delivery feature can be used in a complementary manner with a partition versioning feature supported in the distributed data grid 200. For example, when a new node E 205 is added into the distributed data grid 200, the distributed data grid 200 can bring the state of the newly added node E 205 current, based on the partition versioning feature, so that the node E 205 can start receiving new messages based on the guaranteed multi-point message delivery feature, as described above.

Additional descriptions of various embodiments of using the partition versioning feature in a distributed data grid 200 are provided in U.S. patent application Ser. No. ______, entitled “SYSTEM AND METHOD FOR PROVIDING PARTITION PERSISTENT STATE CONSISTENCY IN A DISTRIBUTED DATA GRID”, filed ______, which application is herein incorporated by reference.

Furthermore, the guaranteed multi-point message delivery feature can be used in a complementary manner with a poll model that is supported in the distributed data grid 200 for processing incoming messages asynchronously.

Additional descriptions of various embodiments of supporting asynchronous message processing in a distributed data grid 200 are provided in U.S. patent application Ser. No. ______, entitled “SYSTEM AND METHOD FOR SUPPORTING ASYNCHRONOUS MESSAGE PROCESSING IN A DISTRIBUTED DATA GRID”, filed ______, which application is herein incorporated by reference.

FIG. 3 is an illustration of creating a chain request message in a distributed data grid in accordance with various embodiments of the invention. As shown in FIG. 3, a cluster node A 301 in the distributed data grid 300 can create a chain request message 320, e.g. using a messaging facility 311 on the cluster node A 301.

The chain request message 320 can be either initiated by the cluster node A 301, or be created based on an incoming message 310 received by the cluster node A 301. The chain request message 320 can include an internal data structure 321 that stores the information about a list of recipient nodes that the chain request message 320 will be delivered to.
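As a hypothetical illustration of such an internal data structure (the class and field names below are illustrative, not taken from any Coherence source), a chain request message can carry its payload together with the ordered recipient list:

    import java.io.Serializable;
    import java.util.List;

    // Hypothetical sketch of a chain request message: the payload travels
    // together with the ordered recipient list, so every recipient can find
    // its successor without consulting the originator.
    public class ChainRequestMessage implements Serializable {
        private final Object payload;             // e.g. the update "x=2"
        private final List<Integer> recipientIds; // ordered member ids, e.g. [B, C, D]

        public ChainRequestMessage(Object payload, List<Integer> recipientIds) {
            this.payload = payload;
            this.recipientIds = List.copyOf(recipientIds); // immutable snapshot of the order
        }

        public Object getPayload()             { return payload; }
        public List<Integer> getRecipientIds() { return recipientIds; }
    }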

As shown in FIG. 3, the cluster node A 301 can deliver the chain request message 320 to the different cluster nodes in the distributed data grid 300, e.g. nodes B-C 302-303, for further processing.

FIG. 4 is an illustration of handling a chain request message in a distributed data grid in accordance with various embodiments of the invention. As shown in FIG. 4, a cluster node, e.g. node B 402, in the distributed data grid 400 can receive a chain request message 420 that contains a list of recipient nodes in an internal data structure 421.

A messaging facility 412 in the cluster node B 402 can track the delivery of the chain request message 420 to the rest of the recipient nodes in the list 421. For example, when the cluster node B 402 detects that node C 403 is dead or unavailable, the cluster node B 402 can access the internal data structure 421 for the list of recipient nodes and find out that node D 404 is the next node following node C 403. Therefore, the cluster node B 402 can deliver the chain request message 420 to node D 404 accordingly.
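Continuing the hypothetical sketches above and reusing the illustrative ChainRequestMessage class, a recipient's handling of a chain request message might look as follows; the Cluster and Transport interfaces and the member-id scheme are assumptions made for this sketch:

    import java.util.List;

    // Hypothetical handler for a chain request message: apply the payload
    // locally, then forward the message to the next live recipient found in
    // the list carried inside the message itself.
    public class ChainRequestHandler {
        interface Cluster   { boolean isAlive(int memberId); }     // assumed liveness oracle
        interface Transport { void send(int memberId, Object m); } // assumed point-to-point send

        private final Cluster cluster;
        private final Transport transport;
        private final int selfId;

        ChainRequestHandler(Cluster cluster, Transport transport, int selfId) {
            this.cluster = cluster;
            this.transport = transport;
            this.selfId = selfId;
        }

        void onChainRequest(ChainRequestMessage msg) {
            // apply msg.getPayload() locally, e.g. record the backup update "x=2"
            // (application-specific; omitted here)
            List<Integer> recipientIds = msg.getRecipientIds();
            int self = recipientIds.indexOf(selfId);
            for (int i = self + 1; i < recipientIds.size(); i++) {
                int next = recipientIds.get(i);
                if (cluster.isAlive(next)) {   // e.g. B finds D when C is dead
                    transport.send(next, msg); // forward the whole chain request
                    return;
                }
                // skip dead or unavailable successors
            }
            // end of the chain: a response can flow back in the reverse direction
        }
    }

Because each recipient forwards from the list inside the message, the chain survives the loss of intermediate nodes; as FIG. 5 discusses next, it also survives the loss of the originator once the message has reached the first recipient.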

FIG. 5 is an illustration of supporting consistent message delivery when a primary owner node of a partition becomes unavailable in a distributed data grid in accordance with various embodiments of the invention. As shown in FIG. 5, a distributed data grid 500 can include a plurality of cluster nodes, such as nodes A-D 501-504.

Initially, a partition can define that the value of a property x is equal to 1 (“x=1”). The partition can be maintained on the primary owner node A 501 and each backup node B-D 502-504. Then, the node A 501 can receive a message from the client 510, which changes the value of the property x to 2 (“x=2”). Thus, the messaging facility 511 in the cluster node A 501 can propagate the message, “x=2”, to each backup node B-D 502-504.

As shown in FIG. 5, when the client 510 detects that the primary owner node A 501 is dead or unavailable, the message, “x=2”, may either have already been delivered to the cluster node B 502, or remain undelivered.

When the message, “x=2”, has already been delivered to the cluster node B 502, the distributed data grid 500 can guarantee that the message is delivered to the rest of the nodes C-D 503-504, and can ensure a consistent view that the value of x is equal to 2. On the other hand, when the message, “x=2”, is undelivered before the primary owner node A 501 becomes unavailable, the distributed data grid 500 maintains the consistent view that the value of x is equal to 1. Thus, the distributed data grid 500 can ensure that the client 510 sees a consistent view of the value of x in either case.

Then, the distributed data grid 500 can provide a new primary owner node, e.g. cluster node E 505, which can continue maintaining the partition and handle subsequent incoming messages from the client 510.

Alternatively, the cluster node A 501 can deliver the message, “x=2”, to each recipient node B-D 502-504 separately, or in parallel (as shown in dotted lines in FIG. 5). Unlike the guaranteed multi-point message delivery feature described above, this alternative approach can be problematic in the scenario when the primary owner node A 501 is dead or becomes unavailable. Since the delivery of the message, “x=2”, to the different cluster nodes B-D 502-504 may not go through, the value of x on the cluster nodes B-D 502-504 is not guaranteed to be consistent.

FIG. 6 illustrates an exemplary flow chart for supporting guaranteed multi-point message delivery in a distributed data grid in accordance with an embodiment of the invention. As shown in FIG. 6, at step 601, a messaging facility on a cluster node in the distributed data grid can receive an incoming message that is adapted to be delivered to a plurality of nodes in the distributed data grid. Then, at step 602, the messaging facility can be configured to deliver the incoming message to the plurality of nodes according to an order in a list. Furthermore, at step 603, a node in the plurality of nodes can skip the next node in the list when delivering the incoming message, if the next node is dead or unavailable.
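For illustration only, the three steps of FIG. 6 can be sketched from the originator's side as follows, reusing the hypothetical ChainRequestMessage class and interface names introduced above:

    import java.util.List;

    // Hypothetical originator-side flow matching steps 601-603: receive an
    // incoming message, wrap it in a chain request with an ordered recipient
    // list, and deliver it to the first live recipient; each recipient then
    // continues the chain as shown in the earlier handler sketch.
    public class MessagingFacility {
        interface Cluster   { boolean isAlive(int memberId); }
        interface Transport { void send(int memberId, Object m); }

        private final Cluster cluster;
        private final Transport transport;

        MessagingFacility(Cluster cluster, Transport transport) {
            this.cluster = cluster;
            this.transport = transport;
        }

        // step 601: receive the incoming message and its intended recipients
        void onIncomingMessage(Object payload, List<Integer> recipientIds) {
            ChainRequestMessage msg = new ChainRequestMessage(payload, recipientIds);
            // steps 602-603: deliver in list order, skipping dead nodes
            for (int memberId : msg.getRecipientIds()) {
                if (cluster.isAlive(memberId)) {
                    transport.send(memberId, msg);
                    return; // the first live recipient carries the chain onward
                }
            }
        }
    }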

The present invention may be conveniently implemented using one or more conventional general purpose or specialized digital computers, computing devices, machines, or microprocessors, including one or more processors, memory and/or computer readable storage media programmed according to the teachings of the present disclosure. Appropriate software coding can readily be prepared by skilled programmers based on the teachings of the present disclosure, as will be apparent to those skilled in the software art.

In some embodiments, the present invention includes a computer program product which is a storage medium or computer readable medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the present invention. The storage medium can include, but is not limited to, any type of disk including floppy disks, optical discs, DVD, CD-ROMs, microdrive, and magneto-optical disks, ROMs, RAMs, EPROMs, EEPROMs, DRAMs, VRAMs, flash memory devices, magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.

The foregoing description of the present invention has been provided for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations will be apparent to the practitioner skilled in the art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, thereby enabling others skilled in the art to understand the invention for various embodiments and with various modifications that are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the following claims and their equivalents.

What is claimed is:
1. A method for supporting consistent message delivery in a distributed data grid operating on one or more microprocessors, comprising: receiving an incoming message that is adapted to be delivered to a plurality of nodes in the distributed data grid; delivering the incoming message to the plurality of nodes according to an order in a list; and allowing a node in the plurality of nodes to skip a next node in the list for delivering the incoming message, when the next node is dead or unavailable.
2. The method according to claim 1, further comprising: allowing the incoming message to include a request from a client; and providing a response to the client after the incoming message is delivered to the plurality of nodes in the distributed data grid.
3. The method according to claim 1, further comprising: allowing the node to deliver the incoming message to a node that is next to the node that is dead or unavailable in the list.
4. The method according to claim 1, further comprising: allowing each node in the plurality of nodes to keep track of the incoming message as the incoming message is delivered according to the list.
5. The method according to claim 1, further comprising: creating a chain request message that stores information about the list in an internal data structure based on the incoming message.
6. The method according to claim 5, further comprising: allowing a node in the distributed data grid to obtain the information about the list from the internal data structure in the chain request message.
7. The method according to claim 1, further comprising: allowing a first node in the list to be a primary owner of a partition and other nodes in the list of nodes to be backup nodes for the first node.
8. The method according to claim 7, further comprising: propagating an update of the partition from the primary owner of the partition to the backup nodes.
9. The method according to claim 1, further comprising: performing a full synchronization on the plurality of nodes in the distributed data grid.
10. The method according to claim 1, further comprising: configuring a new node in the distributed data grid to receive the incoming message.
11. A system for supporting guaranteed multi-point delivery in a distributed data grid, comprising: one or more microprocessors; a messaging facility in the distributed data grid running on the one or more microprocessors, wherein the messaging facility operates to perform the steps of: receiving an incoming message that is adapted to be delivered to a plurality of nodes in the distributed data grid; delivering the incoming message to the plurality of nodes according to an order in a list; and allowing a node in the plurality of nodes to skip a next node in the list for delivering the incoming message, when the next node is dead or unavailable.
12. The system according to claim 11, wherein: the incoming message includes a request from a client, and wherein a response is provided to the client after the incoming message is delivered to the plurality of nodes in the distributed data grid.
13. The system according to claim 11, wherein: the node operates to deliver the incoming message to a node that is next to the node that is dead or unavailable in the list.
14. The system according to claim 11, wherein: each node in the plurality of nodes operates to keep track of the incoming message as it is delivered according to the list.
15. The system according to claim 11, wherein: a first node in the plurality of nodes operates to create a chain request message that stores information about the list in an internal data structure based on the incoming message.
16. The system according to claim 15, wherein: a second node in the distributed data grid operates to obtain the information about the list from the internal data structure in the chain request message.
17. The system according to claim 11, wherein: a first node in the list operates to be a primary owner of a partition and other nodes in the list of nodes operate to be backup nodes for the first node.
18. The system according to claim 17, wherein: an update of the partition is propagated from the primary owner of the partition to the backup nodes.
19. The system according to claim 11, wherein: a new node in the distributed data grid is configured to receive the incoming message.
20. A non-transitory machine readable storage medium having instructions stored thereon that when executed cause a system to perform the steps of: receiving an incoming message that is adapted to be delivered to a plurality of nodes in a distributed data grid; delivering the incoming message to the plurality of nodes according to an order in a list; and allowing a node in the plurality of nodes to skip a next node in the list for delivering the incoming message, when the next node is dead or unavailable.